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An efficient and relatively fast algorithm for the detection of communities in complex networks is 
introduced. The method exploits spectral properties of the graph Laplacian matrix combined with 
hierarchical-clustering techniques, and includes a procedure to maximize the "modularity" of the 
output. Its performance is compared with that of other existing methods, as applied to different 
well-known instances of complex networks with a community-structure, both computer-generated 
and from the real-world. Our results are in all the tested cases, at least as good as the best ones 
obtained with any other methods, and faster in most of the cases than methods providing similar- 
quality results. This converts the algorithm in a valuable computational tool for detecting and 
analyzing communities and modular structures in complex networks. 
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INTRODUCTION 

The outburst of activity in the field of Complex Net- 
works in recent years has been rather spectacular and 
amazing. Networks of any thinkable (and sometimes "un- 
thinkable" ) type, including social, biological and techno- 
logical ones have been described, and their topological 
as well as dynamical features studied. A whole line of 
research has emerged and a new perspective to tackle 
complex problems created. See Q, H, IE ^ HI for reviews 
from different perspectives and for exhaustive lists of ref- 
erences. 

One particular aspect, which has drawn much atten- 
tion, is the existence of subsets of nodes highly linked 
among themselves but loosely connected to the rest of the 
network, i.e. communities. These are believed to play a 
central role in the functional properties of complex struc- 
tures [E 0] . Identifying communities and analyzing their 
nature is an important task in some fields as, for instance, 
computer science jS_P|, sociology 0, 0|, biochemistry 
bibliometrics taxonomv. or, as a more specific 
instance, in the development of efficient search-engines 
for the WWW. According to Flake et al. 0, "as the web 
is self-organized into communities, search-engines imple- 
menting such a concept, would help surfers to find what 
they look for and avoid other contents" . 

The concept of "community" may be retained as rather 
vague and phenomenological. Indeed, depending on the 
network under scrutiny, it might be quite an artificial one, 
while, in other cases, it emerges as a very natural and 
useful structure-analysis tool. A way to make the con- 
cept more clear-cut and practical is through the definition 
of the modularity, Q (see below and a quan- 

tity which provides a way to quantify the community- 
structure of a given network. Other quantities have been 
proposed with the same purpose 
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The problem of finding communities is not new and is 
closely related to the problem of graph-partitioni ng^ pro- 
fusely studied in the context of computer science |18lll9l| . 
A review of some used techniques, including further ref- 
erences, can be found in IHH^- Related problems are 
image processing and pattern recognition, or more generi- 
cally data-clustering: in these cases there is no underlying 
network, but instead some relation or similarity between 
existing elements can be estabilished |^ . 

In recent years many algorithms for detecting com- 
munities have been proposed, starting with the seminal 
work by Girvan and Newman 0, . These authors pro- 
posed an iterative, divisive (as opposed to agglomerative) 
method based on the progressive removal of links with the 
largest betweenness, a quantity proportional to the num- 
ber of shortest paths passing through a given edge [23 |. 
The edges (or links) with the largest betweenness have 
the most prominent role in connecting different parts of 
the graph and, therefore, by removing them recursively 
a good separation of the network into its components 
or communities can be found. This method generates 
very good results and has been employed by different 
authors in studies of various kinds 25]. Unluckily, as 
already pointed out by the authors themselves, it has 
a main disadvantage: its computational demand is very 
high. For instance, for sparse networks with N nodes, the 
computation-time grows like N^. In order to deal with 
large networks, for which the previous algorithm turns 
out to be not viable, Newman himself developed a faster 
method (of the order N"^). It is based on the iterative 
agglomeration of small communities, starting from iso- 
lated nodes, by locally optimizing the modularity. This 
method generates worse results ,22j than the previous 
one. 

Some alternative algorithms both divisive and ag- 
glomerative (which we do not attempt to exhaustively 
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overview here) have been proposed in the last months. 
Some of them are Usted here in chronological order (see 
[ioj for a more critical discussion of some of them): 

• The method by Radicchi-Castellano-Cecconi- 
Loreto-Parisi is of order iV^. It is a divisive 
algorithm that works nicely whenever triangular 
(or higher order) loops are present in the network. 

• Wu-Huberman algorithm 27,]. It is a fast method 
(linear in N), based on the idea of voltage drops, 
which visualizes the network as an electric circuit. 
It can be used to locate the community to which 
one specific node belongs, but it requires successive 
iterations of the method in order to provide a global 
network division in communities. 

• Reichardt-Bornholdt method ,2^]. In this recent 
paper the authors introduce an algorithm inspired 
in the celebrated super-paramagnetic clustering al- 
gorithm devised by Blatt, Wiseman and Domany 
[221 . It is based on a g-state Potts Hamiltonian, 
and allows, for the first time, for the identification 
of fuzzy communities. 

• Capocci-Servedio-Colaiori-Caldarelli method [30| . 
This algorithm combines the use of spectral prop- 
erties (which are nicely reviewed and generalized to 
study different types of networks as, for instance, 
directed ones) with the use of correlation measure- 
ments to determine community closeness. 

• Fortunato-Latora-Marchiori method [l^l • This is a 
variation of the method by Girvan and Newman, in 
which the betweenness is substituted by the alter- 
native concept of information centrality, as a way 
to measure edge-centrality. The method generates 
good results but its performance {N'^ for a sparse 
graph) is rather poor. 

Apart from these techniques recently introduced in the 
field of complex networks, many other algorithms have 
been developed mainly in the context of computer sci- 
ence. Most of them employ spectral analysis, which pro- 
vides, in a very natural way (using the first non-trivial 
eigenmode) a tool for bi-partitioning [s^l as will be il- 
lustrated along this paper. By iterative applications of 
bi-partitioning more elaborated divisions into communi- 
ties or components can be achieved 

US El- Alter- 
natively, some other spectral methods employ more than 
one eigenmode leading directly to a splitting 

113,111113 

Without neglecting any of these algorithms, which can 
be applicable depending on the situation under consider- 
ation, this paper introduces yet a new method, allowing 
for a systematic analysis and detection of communities. 
It combines the following features: i) the generation of 
good results in all the tested cases, ii) it is relatively fast, 
as compared with methods providing comparable results. 



iii) it includes a way to optimize the output, as will be 
explained in what follows. 

The method proposed in this paper combines spectral 
methods with clustering techniques, and uses the concept 
of modularity in order to develop a working algorithm. 
More precisely, the main lines of the algorithm are as 
follows: spectral analysis of the Laplacian matrix allows 
us to project the network-nodes into an eigenvector- space 
of variable (tunable) dimensionality. Afterwards, a met- 
ric is introduced in various possible fashions, and then, 
finally, byapplying standard clustering techniques a den- 
drogram |6j is built up. The modularity of possible group- 
ings (sections of the dendrogram) is maximized for every 
considered dimension of the eigenvector-space and finally, 
the global maximum over all possible number of eigen- 
vectors (i.e dimensions of the space) is found. 

In the forthcoming sections we review some basic ideas 
and definitions of spectral analysis and we introduce our 
algorithm step by step. Then we apply it to different 
workbench networks, comparing its performance with 
that of other existing methods and, finally, the conclu- 
sions are presented. 

USING THE LAPLACIAN EIGENVECTORS TO 
DETECT COMMUNITIES 

Spectral analysis: Laplacian eigenvectors 

The topology of a network with N vertices can be ex- 
pressed through a symmetric N x N matrix L, the Lapla- 
cian matrix . The diagonal elements La are given by 
the degree ki of the corresponding vertex i, while off- 
diagonal elements are equal to —1 if an edge between 
the corresponding vertices i and j exists and otherwise. 
The sum of elements over every fixed row or column is, 
trivially, equal to zero. Therefore, a "constant vector" 
(with all its components taking the same value) is an 
eigenvector with eigenvalue 0. Furthermore, since the 
quadratic form 

n 

LijXiXj 

can be written as 

y ^ (a^i ~ Xj) , 

links 

which is positive semidefinite, the eigenvalues of L are 
either zero or positive The use of other matrices, 

employed to study network spectral properties has been 
recently considered in yfl . 

If the graph under analysis is connected, there is only 
one zero eigenvalue corresponding to a constant eigen- 
vector. On the contrary, for non-connected graphs (com- 
posed by m connected components) the Laplacian matrix 
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is block diagonal. Each block is the Laplacian of a sub- 
graph and it admits a constant eigenvector with eigen- 
value 0. Therefore, the Laplacian of the whole graph has 
m degenerate eigenvectors (corresponding to eigenvalue 
zero) , each of them having nonzero constant components 
for nodes in the associated subgraph and in the rest. 

If the subgraphs are not fully disconnected but, in- 
stead, a few links exist between them, the degeneration 
disappears. This leaves only one trivial eigenvector with 
eigenvalue and m — 1 approximate linear combinations 
of the old ones with slightly non-vanishing eigenvalues 
[2oll30| . As the Laplacian matrix is real-symmetric, with 
orthogonal eigenvectors, and since the first of them has 
equal components, all the other ones must have compo- 
nents whose total sum vanishes. In order to illustrate 
how these ideas can be applied to identify communi- 
ties, let us take, as a particular example, the number 
of subgraphs to be 2. In this case, the components of 
the second (first nontrivial) eigenvector are positive for 
one subgraph and have to be negative for the other, pro- 
viding a clear-cut criterion to bisect the graph [22|- If 
the two subgraphs are not very well separated, then this 
distinction between positive and negative values becomes 
fuzzier. In such cases, more elaborated criteria to decide 
how to separate into two subgroups have been profusely 
studied in the specialized literature. Some of them opti- 
mize purposely defined quantities as the normalized cut 
|34| or the conductance |33| . which are defined as func- 
tions of the number of links that exist between the two 
components and their sizes • By iterating successive 
bisections, techniques to obtain more elaborate splittings 
can be constructed 0, HE 1^ • 

An alternative strategy is to assume that if there are 
more than two weakly-connected blocks it should be 
somehow possible to find them all by inspecting the 
eigenvalue spectrum more accurately, instead of consid- 
ering just the first non-trivial eigenmode jsH . l3(il |. Let 
us explore this idea, which is the one we will exploit, in 
more detail. FigureQjL shows the components of the first 
nontrivial eigenvector of a computer-generated graph in- 
cluding 4 communities, each composed by 32 nodes (see 
forthcoming sections for details). The group structure 
is clear, even if the two communities at the bottom are 
very near to each other and some nodes could be mis- 
classified. In other examples, with a number of inter- 
group connections larger than here, the communities be- 
come more entangled, and the prospective of extracting 
clear-cut subdivisions using this type of one-dimensional 
plot worsens. This difficulty can be circumvented by tak- 
ing into account some more eigenvectors, i.e. by enlarg- 
ing the projection-space. This is illustrated in figure 
where the nodes of the same graph are plotted using the 
components on the first two nontrivial eigenvectors as 
coordinates. Simple eye-inspection shows that all com- 
munities are distinctly separated now. Actually, using 
three eigenvectors the nodes of the different groups fall 
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FIG. 1: (a) Components of the first non-trivial eigenvector for 
a computer-generated network with 4 communities (see main 
text). Two communities are clearly identified while the other 
two overlap, (b) All communities can be clearly identified 
when the components of the second eigenvector are plotted 
versus those of the first one; i.e. when the dimensionality of 
the eigenvector-space is enlarged. 



around the vertices of a (slightly distorted) tetrahedron, 
with some further improvement in inter-community sepa- 
ration. Generalizing this idea, each vertex in the graph is 
represented by a point in a D- dimensional space in which 
the coordinates are given by its projections on the first D 
nontrivial eigenvectors. 



Introducing a metric 

Aimed at turning "eye-inspection" of communities into 
a more quantitative measure, the explicit introduction of 
a metric (or similarity measure) is required. The most 
straightforward choice would be the Euclidean distance. 
However, this is not the only possibility; another one is 
to consider the angular distance, defined as the angle be- 
tween the vectors joining the origin of the _D-dimensional 
space with the two points under consideration. This pos- 
sibility is inspired by empirical observations: loosely con- 
nected nodes could be quite "Euclideanly" far from each 
other within a community, but still hdng in the same "di- 
rection" in the eigenvector-space |41|. Moreover, when 
networks are large, nodes in the same community form 
a roughly one-dimensional "bundle" (see for example fig- 
ure 3 in 0). Note also, that using angular distances 
is tantamount to normalizing the position-vectors in the 
corresponding space and then measurin g th e Euclidean 
distance, similarly to what proposed in [Sg- As will be 
shown, the angular metric generates, as a matter of fact, 
better results than the Euclidean one. 
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Cluster analysis 

Having introduced a way to measure distances in the 
eigenvector space, a method to group nodes into commu- 
nities is required. Such a method is provided by standard 
clustering techniques |l8| as, for example, hierarchical 
clustering. Starting from N clusters, composed by indi- 
vidual nodes, the two closest ones are iteratively joined 
together. In order to define cluster-to-cluster distance or 
"closeness" (for a given metric) different criteria can be 
employed, genera ting among others, the following clus- 
tering algorithms jig: 

• All possible pairs of nodes, taking one from each of 
the two clusters under examination, are considered. 
The minimum possible node-to-node distance is de- 
clared to be the cluster-to-cluster closeness. This 
leads to single-linkage clustering. 

• Proceeding as before, but replacing the "minimum 
possible node-to-node distance" between pairs by 
the "maximum" one, complete-linkage clustering is 
defined. 

• Another possibility consists in taking the average 
distance between all possible pairs. This leads to 

group-average clustering. 

• A cluster is represented by a single point located 
at its "center of mass"; the cluster-to-cluster dis- 
tance is defined as the node-to-node distance be- 
tween these two points. This leads to centroid clus- 
tering. 

All these criteria have been broadly studied and ap- 
plied. None of them can be proved to be generically more 
efficient than the others. In particular, the single-linkage 
method, being very simple, can be useful to analyze large 
data sets , an d possesses some further mathematical ad- 
vantages [13 ■ On the contrary, it has a tendency to clus- 
ter together, at a relatively low level, distant nodes linked 
successively by a series of intermediates. This is usually 
called chaining property, which constitutes in some cases 
a serious drawback. 

On the other hand, a convenient advantage of both, 
single and complete-linkage clusterings, is that only the 
ordering of the similarity measure is important: every 
other metric which produces the same ordering of dis- 
tances leads to the same results. 

The output of these algorithms can be represented by 
a hierarchical tree usually called dendrogram. The start- 
ing single-node communities are the branch-tips of such a 
tree, which are repeatedly joined until the whole network 
has been reconstructed as a single component (see, for in- 
stance, figure 2 in 0). Each level of the tree represents 
a possible splitting of the network into a set of commu- 
nities, obtained by halting the clustering process at the 



corresponding level. However, the clustering algorithm 
gives no hint about the "goodness" of such a partition. 

Modularity 

In order to quantify the validity of possible sub- 
divisions (obtained as explained above) and to optimize 
the chosen splitting, we use, following 0, 0| , the con- 
cept of modularity. It is defined as follows: given a net- 
work division, let en be the fraction of edges in the net- 
work between any two vertices in the subgroup i, and 
Oi the total fraction of edges with one vertex in group i 
(where edges "internal" to each group have weight 1 while 
inter-group hnks are weighted 1/2). The modularity, Q, 
is then defined as 

Q = ^ieu-a,f. (1) 

i 

It measures the fraction of edges that fall between com- 
munities minus the expected value of same quantity in a 
random graph with the same community division. 

The maximization of modularity has been proposed as 
a possible way to detect communities; since a full maxi- 
mization is not possible in practice (the algorithm would 
take an amount of time exponential in the number of 
nodes to explore all possible splittings) an approximate 
algorithm has been suggested |l^]. In our case, modular- 
ity measurements are simply used to find the best split- 
ting among all the possible partitions of the dendrogram 
obtained following the previous steps [l3 |. 

Other indeces quantifying the quality of splittings have 
been also proposed in the literature. Some of them are 
the "conductance" , the "performance" , and the "cover- 
age" to name but a few (see 0| and references therein for 
more details). None of these taken by itself, provides a 
fully useful criterion; they have to be combined somehow. 
It seems that the modularity is a better, more efficient, 
choice. 

Implementing a functioning algorithm 

Summarizing the ideas introduced in the previous sub- 
sections, our algorithm can be synthesized and imple- 
mented to build up a functioning algorithm as follows. 
First a few eigenvalues and eigenvectors of the network 
Laplacian matrix are computed. The question of what "a 
few" means will be tackled afterwards. Since the Lapla- 
cian is usually a sparse matrix and not all eigenpairs are 
required (that will require a time N^) the relatively fast 
Lanczos method ^3 can be employed. Nonetheless, the 
eigenvector computation is still the most computation- 
ally expensive step of the algorithm. 

For any given number D of eigenvectors (i.e. for a 
fixed dimension of the space) a similarity measure (or 
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metric) is chosen, providing a basis to apply one of the 
previously introduced clustering techniques. Typically, 
Euclidean or, better, angular distances are employed. 

Among the various hierarchical clustering methods 
available, we test single- and complete-linkage clustering 
algorithms. These two have the advantage that no new 
distances have to be calculated during cluster formation: 
when two subgroups merge to form a larger one, its dis- 
tance to any other cluster is given by the shortest (single- 
linkage) or by the largest (complete-linkage) of the dis- 
tances from the two original components. As said before, 
single-linkage performs poorly in many cases owing to 
the previously discussed "chaining" property, converting 
complete-linkage in the preferential choice. Other link- 
age methods will be explored in the future; in particular, 
group-average linkage could be suitable when studying 
tree-like graphs [i^ . 

An important difference between the way we apply 
clustering techniques and other standard applications is 
that we know in advance the underlying network struc- 
ture. Using this knowledge we implement the constraint 
that two clusters are susceptible to be merged only if there 
exists a link between them in the original network. 

At every step of the clustering process the modularity 
is computed. Once the whole dendrogram is completed, 
the splitting with the maximum modularity is chosen as 
the output for the corresponding D. 

The optimal value of D to be taken is not known a 
priori, but since the eigenvalue calculation is the slow- 
est part of the algorithm, we can repeat the hierarchical 
clustering using all possible values of D, and look up for 
the largest value of the modularity. Typically the largest- 
modularity vs. D curve exhibits a maximum whose cor- 
responding splitting provides the algorithm final output. 
If, instead, the curve keeps on growing up to the largest 
D, the number of computed eigenpairs has to be enlarged, 
in order to extend the range of the curve, until a clear-cut 
maximum is pin-pointed. 



TESTS OF THE METHOD 
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FIG. 2: Maximum modularity found by the algorithm, di- 
vided by that of the known splitting of a computer generated 
random graph (see main text); the average over 200 graphs is 
plotted as a function of fcout. 
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FIG. 3: Fraction of nodes of computer generated random 
graphs correctly identified by the algorithm, averaged over 
200 graphs, as a function of fcout. 



Artificial community networlis 

To prove the algorithm we first test it on computer- 
generated random graphs with a well-known pre- 
determined community structure 0. Each graph has 
N — 128 nodes divided into 4 communities of 32 nodes 
each. Edges between two nodes are introduced with dif- 
ferent probabilities depending on whether the two nodes 
belong to the same group or not: every node has fcin links 
on average to its fellows in the same community, and fcout 
links to the outer-world, keeping fcin -|- fcout = 16. 

In figures El and 01 we plot the modularity correspond- 
ing to the best splitting identified by the algorithm nor- 
malized by the one of the known answer, and the average 



number of correctly classified vertices, respectively. Data 
for both, Euclidean and angular measures, and both, 
single- and complete-linkage algorithms, are shown. The 
number of eigenvalues leading to the largest modularity is 
between 3 and 5 for the angular distance, and between 2 
and 4 for the Euclidean one. Let us remark that these are 
roughly equal to the number of communities and that the 
performance is much better using the angular distance. 

Summing up: on these computer-generated networks, 
our algorithm (equipped with the angular distance and 
complete-linkage) generates excellent results as compared 
with other methods (see, for instance, figure 1 in [l5j and 
figure 3 in [^). 
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TABLE I: Modularity of the best splitting of the Zachary 
club network obtained for different metrics and clustering al- 
gorithms. 




FIG. 4: Splitting of the Zachary club network. Squares and 
circles indicate the two communities observed by Zachary, col- 
ors denote the further subdivision found by our algorithm. 

Zachary karate club 

Now we consider the well-known karate club friend- 
ship network studied by Zachary which has become 
a commonly used workbench for community-finding al- 
gorithms testing IS III III in m m EH . 

Tabic m shows the maximum modularity found by the 
algorithm: the best value is again obtained using angular 
distances combined with either single or complete-linkage 
clustering. 

The best splitting is shown in figure 0] it is different 
from the "actual" breakdown of the club; i.e. the two 
groups reported by Zachary are further subdivided. Let 
us stress the presence of a single-node community (node 
12), and the fact that the modularity value of this split- 
ting is larger than Zachary's one (0.371), and larger that 
the ones found using other methods |15[ l28[ l3ll | . 

In this case single and complete-linkage give the same 
best splitting. Nevertheless, the hierarchical structure 
given by the dendrogram in the two cases are quite dif- 
ferent. Figure El shows how clusters merge after the best 
splitting is identified, as well as the modularity value cor- 
responding to each division. For complete-linkage the 
modularity value remains close to the best one until the 
whole network is merged in one community. On the other 
hand, for single-linkage it falls down rather abruptly 
right after the first merging, owing to the chaining prob- 
lem. Moreover, in the former case, the two Zachary 
communities are first reconstructed and then joined to- 
gether, while in the latter the merging proceeds differ- 



complete linkage clustering single linkage clustering 

0.5r 0.5r 




FIG. 5: Comparison between the final part of the dendrogram 
for the Zachary club, by using complete- and single-linkage 
clustering (bottom), together with the corresponding modu- 
larity values (top). 

ently. Therefore, even if the best splitting is the same one 
in both cases, complete-linkage produces a more reliable 
dendrogram, describing more accurately the hierarchical 
structure. 



Scientific collaboration networks 

In order to test the method performance on larger net- 
works we consider two scientific-collaboration networks 
first analyzed by Newman |45|. The vertices are the 
authors of the papers appeared in the cond-mat and 
hep-th archives at ArXiv.org between 1995 and 1999. 
Two authors are linked if they have co-authored a paper 
together. 

The cond-mat network contains 16726 nodes, but we 
focus on its largest connected component, which contains 
only 13861 authors. The computation of the first 1000 
eigenvectors takes about two hours on a personal com- 
puter. The modularity curve computation, calculated 
using up to 999 eigenvectors, lasts around 15 minutes. 
Results for angular distance and complete-linkage clus- 
tering are plotted in figure The largest value of the 
modularity, Q — 0.736, achieved for a splitting in 229 
communities, corresponds to a 602-dimensional space. 
Obviously, we cannot compare the final splitting with a 
"true" one, which is not defined. As the curve in figure|B| 
is rather flat in its tail, one can legitimately wonder how 
does the best splitting compare to other ones obtained 
for similar dimensionalities. This question is difficult to 
answer in a rigorous way, and will deserve further anal- 
ysis, which will eventually lead to a functional definition 
of the community-structure robustness. 

Analogously, the hep-th network has 8361 authors 
with a connected component of 5835. The largest mod- 
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FIG. 6: Maximum modularity as a function of the number 
of eigenvectors for the cond-mat (top) and hep-th (bottom) 
networks. 

ularity value, Q = 0.707, is produced by a division into 

114 communities, obtained using 416 eigenvectors. The 
computation of the first 1000 eigenvectors takes around 
30 minutes and the search of the largest modularity value 
about 8 minutes. In this case, the initial number of eigen- 
vectors could have been taken much smaller than 1000, 
without affecting the final output, with the consequent 
time saving. 

As in previous cases, the number of eigenvectors used 
to produce the best splitting is of the order of magnitude 
of the number of found communities. In these cases, com- 
parison with previous community studies is not feasible, 
as modularity measurements have not been (to the best 
of our knowledge) reported in literature. 

CONCLUSIONS 

We have introduced a new algorithm aimed at detect- 
ing community structure in complex networks in an effi- 
cient and systematic way. The method combines spectral 
techniques, cluster analysis, and the recently introduced 
concept of modularity. 

The nodes of the network are projected into a D- 
dimensional space, where D is a number of first non- 
trivial eigenvectors of the Laplacian matrix; their coordi- 
nates are the node-projections on each eigenvector. Then 
a metric (either Euclidean or angular) is introduced in 
such an eigenvector space. Once distances arc computed, 
standard hierarchical clustering techniques (as, for in- 
stance, complete-linkage clustering) are employed to gen- 
erate a dendrogram. The subdivision of this dendrogram 
giving the maximum modularity is taken as the output 
of the algorithm for a fixed D. Then, also D is allowed 
to vary (from 1 to some arbitrary, maximum value) pro- 
viding a way to maximize the modularity and enhance 



the performance of the method. 

The best results arc obtained using the angular dis- 
tance and complete-linkage clustering; however, other 
types of distances, other clustering algorithms, or even 
other means to quantify the goodness of a division could 
be used to improve the results. In this sense our algo- 
rithm is a "block-modular" one: modifications of any of 
its ingredients could possibly lead to an overall improve- 
ment. 

Even if spectral methods have been profusely used be- 
fore to analyze similar problems, we believe that our al- 
gorithm represents a step forward in studying complex- 
network communities, as it combines spectral techniques 
with (i) the novel concept of modularity, which provides 
a very adequate estimate of the quality of a given split- 
ting and (ii) a way to optimize the number of eigenmodes 
taken into consideration. 

The weakest part of the method is that the maxi- 
mum number of eigenvectors to be computed in order 
to find the one generating the maximum modularity is 
not known a priori. Being the calculation of eigenvec- 
tors the slowest part of the algorithm, what we do is to 
take a reasonable number of them and, afterwards, verify 
that the maximum-modularity curve as a function of D 
decreases at its tail; i.e. we make sure that a maximum 
of the modularity function is located. If this is not the 
case, the number of eigenvectors needs to be enlarged, 
at the cost of higher computational effort. In the ab- 
sence of a general criterion to establish the monotonicity 
of the modularity curve, the only possible way to decide 
whether the identified local maximum is the global one, 
would be to compute all possible eigenvalues. In practice, 
in all the studied cases, the best splitting is found with 
a relatively small number of eigenvectors, converting the 
algorithm in a reliable, relatively fast, and very efficient 
one. 

An open challenge would be identifying a systematic 

criterion to estimate, a priori, what is the order of mag- 
nitude of the number of eigenvalues to be computed to 
further optimize the output and eSiciency. 

We hope that this new algorithm will be employed with 

success in the search and study of communities in com- 
plex networks, and will help to uncover new interesting 
properties. 
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